AITopics | bias mitigation

Collaborating Authors

bias mitigation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An Empirical Survey of Model Merging Algorithms for Social Bias Mitigation

Shirafuji, Daiki, Saito, Tatsuhiko, Kimura, Yasutomo

arXiv.org Artificial IntelligenceDec-3-2025

Large language models (LLMs) are known to inherit and even amplify societal biases present in their pre-training corpora, threatening fairness and social trust. To address this issue, recent work has explored ``editing'' LLM parameters to mitigate social bias with model merging approaches; however, there is no empirical comparison. In this work, we empirically survey seven algorithms: Linear, Karcher Mean, SLERP, NuSLERP, TIES, DELLA, and Nearswap, applying 13 open weight models in the GPT, LLaMA, and Qwen families. We perform a comprehensive evaluation using three bias datasets (BBQ, BOLD, and HONEST) and measure the impact of these techniques on LLM performance in downstream tasks of the SuperGLUE benchmark. We find a trade-off between bias reduction and downstream performance: methods achieving greater bias mitigation degrade accuracy, particularly on tasks requiring reading comprehension and commonsense and causal reasoning. Among the merging algorithms, Linear, SLERP, and Nearswap consistently reduce bias while maintaining overall performance, with SLERP at moderate interpolation weights emerging as the most balanced choice. These results highlight the potential of model merging algorithms for bias mitigation, while indicating that excessive debiasing or inappropriate merging methods may lead to the degradation of important linguistic abilities.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2512.02689

Country:

North America > United States (0.93)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Effect of Enforcing Fairness on Reshaping Explanations in Machine Learning Models

Anderson, Joshua Wolff, Visweswaran, Shyam

arXiv.org Artificial IntelligenceDec-3-2025

Trustworthy machine learning in healthcare requires strong predictive performance, fairness, and explanations. While it is known that improving fairness can affect predictive performance, little is known about how fairness improvements influence explainability, an essential ingredient for clinical trust. Clinicians may hesitate to rely on a model whose explanations shift after fairness constraints are applied. In this study, we examine how enhancing fairness through bias mitigation techniques reshapes Shapley-based feature rankings. We quantify changes in feature importance rankings after applying fairness constraints across three datasets: pediatric urinary tract infection risk, direct anticoagulant bleeding risk, and recidivism risk. We also evaluate multiple model classes on the stability of Shapley-based rankings. We find that increasing model fairness across racial subgroups can significantly alter feature importance rankings, sometimes in different ways across groups. These results highlight the need to jointly consider accuracy, fairness, and explainability in model assessment rather than in isolation.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2512.02265

Country: North America > United States (0.46)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.70)

Add feedback

Evolved SampleWeights for Bias Mitigation: Effectiveness Depends on Optimization Objectives

Saini, Anil K., Hernandez, Jose Guadalupe, Wong, Emily F., Misra, Debanshi, Moore, Jason H.

arXiv.org Artificial IntelligenceNov-27-2025

Machine learning models trained on real-world data may inadvertently make biased predictions that negatively impact marginalized communities. Reweighting is a method that can mitigate such bias in model predictions by assigning a weight to each data point used during model training. In this paper, we compare three methods for generating these weights: (1) evolving them using a Genetic Algorithm (GA), (2) computing them using only dataset characteristics, and (3) assigning equal weights to all data points. Model performance under each strategy was evaluated using paired predictive and fairness metrics, which also served as optimization objectives for the GA during evolution. Specifically, we used two predictive metrics (accuracy and area under the Receiver Operating Characteristic curve) and two fairness metrics (demographic parity difference and subgroup false negative fairness). Using experiments on eleven publicly available datasets (including two medical datasets), we show that evolved sample weights can produce models that achieve better trade-offs between fairness and predictive performance than alternative weighting methods. However, the magnitude of these benefits depends strongly on the choice of optimization objectives. Our experiments reveal that optimizing with accuracy and demographic parity difference metrics yields the largest number of datasets for which evolved weights are significantly better than other weighting strategies in optimizing both objectives.

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.20909

Country: North America > United States > California > Los Angeles County > Los Angeles (0.30)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

Patching LLM Like Software: A Lightweight Method for Improving Safety Policy in Large Language Models

Arif, Huzaifa, Murugesan, Keerthiram, Ko, Ching-Yun, Chen, Pin-Yu, Das, Payel, Gittens, Alex

arXiv.org Artificial IntelligenceNov-12-2025

We propose patching for large language models (LLM) like software versions, a lightweight and modular approach for addressing safety vulnerability. While vendors release improved LLM versions, major releases are costly, infrequent, and difficult to tailor to customer needs, leaving released models with known safety gaps. Unlike full-model fine-tuning or major version updates, our method enables rapid remediation by prepending a compact, learnable prefix to an existing model. This "patch" introduces only 0.003% additional parameters, yet reliably steers model behavior toward that of a safer reference model. Across three critical domains--toxicity mitigation, bias reduction, and harmfulness refusal--policy patches achieve safety improvements comparable to next-generation safety-aligned models while preserving fluency. Our results demonstrate that LLMs can be "patched" much like software, offering vendors and practitioners a practical mechanism for distributing scalable, efficient, and composable safety updates between major model releases. Large language models (LLMs) have achieved remarkable advances in reasoning, generation, and multilingual capabilities (Brown et al., 2020; Wei et al., 2022; Conneau & Lample, 2019). Despite their impressive capabilities, they continue to exhibit serious safety concerns, such as the generation of toxic language (Gehman et al., 2020), biased associations that reinforce stereotypes (Dong et al., 2024), and the production of harmful or dangerous content (Mazeika et al., 2024). Addressing these risks is crucial to the broader challenge of alignment, where models are refined to better align with human values and expectations. Conventional approaches to improving safety rely on alignment techniques such as Reinforcement Learning from Human Feedback (RLHF) (Christiano et al., 2017; Bai et al., 2022; Ouyang et al., 2022), preference-based fine-tuning (Rafailov et al., 2023), domain-specific supervised fine-tuning (Li et al., 2024), etc. While these methods have proven effective, they require substantial computational resources, large-scale data curation, and careful model retraining. In practice, model providers (vendors) often release major updates to their models (major versions) on a fixed schedule, typically once or twice a year.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.08484

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.86)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Automatic Assessment of Students' Classroom Engagement with Bias Mitigated Multi-task Model

Thiering, James, Krishna, Tarun Sethupat Radha, Zelkin, Dylan, Biswas, Ashis Kumer

arXiv.org Artificial IntelligenceOct-28-2025

With the rise of online and virtual learning, monitoring and enhancing student engagement have become an important aspect of effective education. Traditional methods of assessing a student's involvement might not be applicable directly to virtual environments. In this study, we focused on this problem and addressed the need to develop an automated system to detect student engagement levels during online learning. We proposed a novel training method which can discourage a model from leveraging sensitive features like gender for its predictions. The proposed method offers benefits not only in the enforcement of ethical standards, but also to enhance interpretability of the model predictions. We applied an attribute-orthogonal regularization technique to a split-model classifier, which uses multiple transfer learning strategies to achieve effective results in reducing disparity in the distribution of prediction for sensitivity groups from a Pearson correlation coefficient of 0.897 for the unmitigated model, to 0.999 for the mitigated model. The source code for this project is available on https://github.com/ashiskb/elearning-engagement-study .

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.22057

Country: North America > United States > Colorado > Denver County > Denver (0.15)

Genre: Research Report (1.00)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.68)

Add feedback

ConQuER: Modular Architectures for Control and Bias Mitigation in IQP Quantum Generative Models

Zou, Xiaocheng, Duan, Shijin, Fleming, Charles, Liu, Gaowen, Kompella, Ramana Rao, Ren, Shaolei, Xu, Xiaolin

arXiv.org Artificial IntelligenceOct-14-2025

Quantum generative models based on instantaneous quantum polynomial (IQP) circuits show great promise in learning complex distributions while maintaining classical train-ability. However, current implementations suffer from two key limitations: lack of controllability over generated outputs and severe generation bias towards certain expected patterns. We present a Controllable Quantum Generative Framework, ConQuER, which addresses both challenges through a modular circuit architecture. ConQuER embeds a lightweight controller circuit that can be directly combined with pre-trained IQP circuits to precisely control the output distribution without full retraining. Leveraging the advantages of IQP, our scheme enables precise control over properties such as the Hamming Weight distribution with minimal parameter and gate overhead. In addition, inspired by the controller design, we extend this modular approach through data-driven optimization to embed implicit control paths in the underlying IQP architecture, significantly reducing generation bias on structured datasets. ConQuER retains efficient classical training properties and high scalability.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.22551

Country:

North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.63)

Add feedback

LLM Bias Detection and Mitigation through the Lens of Desired Distributions

Shrestha, Ingroj, Srinivasan, Padmini

arXiv.org Artificial IntelligenceOct-9-2025

Although prior work on bias mitigation has focused on promoting social equality and demographic parity, less attention has been given to aligning LLM's outputs to desired distributions. For example, we might want to align a model with real-world distributions to support factual grounding. Thus, we define bias as deviation from a desired distribution, which may be an equal or real-world distribution, depending on application goals. We propose a weighted adaptive loss based fine-tuning method that aligns LLM's gender-profession output distribution with the desired distribution, while preserving language modeling capability. Using 3 profession sets -- male-dominated, female-dominated, and gender-balanced -- derived from U.S. labor statistics (2024), we assess both our adaptive method for reflecting reality and a non-adaptive variant for equality. Across three masked language models, bias is observed under both distributions. We achieve near-complete mitigation under equality and 30-75% reduction under real-world settings. Autoregressive LLMs show no bias under equality but notable bias under real-world settings, with the Llama Instruct models (3.2-3B, 3.1-8B) achieving a 50-62% reduction.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.06354

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Washington > King County > Seattle (0.14)
(14 more...)

Genre: Research Report (1.00)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Open-DeBias: Toward Mitigating Open-Set Bias in Language Models

Rani, Arti, Singh, Shweta, Sahoo, Nihar Ranjan, Nayak, Gaurav Kumar

arXiv.org Artificial IntelligenceSep-30-2025

Large Language Models (LLMs) have achieved remarkable success on question answering (QA) tasks, yet they often encode harmful biases that compromise fairness and trustworthiness. Most existing bias mitigation approaches are restricted to predefined categories, limiting their ability to address novel or context-specific emergent biases. To bridge this gap, we tackle the novel problem of open-set bias detection and mitigation in text-based QA. We introduce OpenBiasBench, a comprehensive benchmark designed to evaluate biases across a wide range of categories and subgroups, encompassing both known and previously unseen biases. Additionally, we propose Open-DeBias, a novel, data-efficient, and parameter-efficient debiasing method that leverages adapter modules to mitigate existing social and stereotypical biases while generalizing to unseen ones. Compared to the state-of-the-art BMBI method, Open-DeBias improves QA accuracy on BBQ dataset by nearly $48\%$ on ambiguous subsets and $6\%$ on disambiguated ones, using adapters fine-tuned on just a small fraction of the training data. Remarkably, the same adapters, in a zero-shot transfer to Korean BBQ, achieve $84\%$ accuracy, demonstrating robust language-agnostic generalization. Through extensive evaluation, we also validate the effectiveness of Open-DeBias across a broad range of NLP tasks, including StereoSet and CrowS-Pairs, highlighting its robustness, multilingual strength, and suitability for general-purpose, open-domain bias mitigation. The project page is available at: https://sites.google.com/view/open-debias25

category, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.23805

Country:

North America > Mexico (0.28)
North America > United States (0.28)
Europe > Switzerland (0.28)
Asia > Middle East (0.28)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Industry:

Education (0.68)
Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Steering Towards Fairness: Mitigating Political Bias in LLMs

Nadeem, Afrozah, Dras, Mark, Naseem, Usman

arXiv.org Artificial IntelligenceSep-23-2025

Recent advancements in large language models (LLMs) have enabled their widespread use across diverse real-world applications. However, concerns remain about their tendency to encode and reproduce ideological biases along political and economic dimensions. In this paper, we employ a framework for probing and mitigating such biases in decoder-based LLMs through analysis of internal model representations. Grounded in the Political Compass Test (PCT), this method uses contrastive pairs to extract and compare hidden layer activations from models like Mistral and DeepSeek. We introduce a comprehensive activation extraction pipeline capable of layer-wise analysis across multiple ideological axes, revealing meaningful disparities linked to political framing. Our results show that decoder LLMs systematically encode representational bias across layers, which can be leveraged for effective steering vector-based mitigation. This work provides new insights into how political bias is encoded in LLMs and offers a principled approach to debiasing beyond surface-level output interventions.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.08846

Country: Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization

Gupta, Mukur, Varimalla, Nikhil Reddy, Deas, Nicholas, Subbiah, Melanie, McKeown, Kathleen

arXiv.org Artificial IntelligenceSep-23-2025

Large Language Models (LLMs) have achieved impressive performance in text summarization and are increasingly deployed in real-world applications. However, these systems often inherit associative and framing biases from pre-training data, leading to inappropriate or unfair outputs in downstream tasks. In this work, we present AdvSumm (Adversarial Summarization), a domain-agnostic training framework designed to mitigate bias in text summarization through improved generalization. Inspired by adversarial robustness, AdvSumm introduces a novel Perturber component that applies gradient-guided perturbations at the embedding level of Sequence-to-Sequence models, enhancing the model's robustness to input variations. We empirically demonstrate that AdvSumm effectively reduces different types of bias in summarization-specifically, name-nationality bias and political framing bias-without compromising summarization quality. Compared to standard transformers and data augmentation techniques like back-translation, AdvSumm achieves stronger bias mitigation performance across benchmark datasets.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2506.06273

Country:

Europe (0.93)
North America > United States > Texas (0.14)

Genre: Research Report (1.00)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback